
    Controlling speculative execution through a virtually ordered memory system

    Processors that extract parallelism through speculative execution must be able to identify when mis-speculation has occurred. The three places where mis-speculation can occur are register accesses, control flow prediction and memory accesses. Controlling register and control flow speculation has been well studied, but no scalable techniques for identifying memory dependence violations have been identified. Since speculative execution occurs out of order, this requires tracking the causal order, as well as the addresses, of memory accesses. This thesis uses simulations to investigate tracking the causal order of memory accesses using explicit tags known as virtual timestamps, a distributed and scalable method. Realizable virtual timestamps are necessarily restricted in length, and it is demonstrated that naive allocation schemes seriously constrain execution by allocating virtual timestamps inefficiently. Efficient allocation requires analysis of the number of virtual timestamps required by each section of code. Basic statically and dynamically evaluated analysis methods are established to prevent virtual timestamp allocation from becoming a resource bottleneck. The same analysis is also used to allocate state-saving resources efficiently in a fixed hardware order. The hardware order provides an alternative way of maintaining the causal order using a simple hardware organization. The ability to predict the resources required by regions of code is used as a way of selecting instructions to execute speculatively. This enables resources to be allocated efficiently and is shown to allow large amounts of parallelism to be extracted. It also improves the effectiveness of speculative execution by issuing fewer instructions that will ultimately be rolled back. Using a hierarchy of hardware ordering modules, themselves ordered by explicit virtual timestamps, a scalable ordering system is proposed. This hierarchy forms the basis of a twisted memory system, a multiple-version memory system capable of identifying speculative memory dependence violations. The preliminary investigations presented here show that twisted memory has the potential to support aggressive speculative parallel execution. Particular attention is paid to memory bandwidth requirements.
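    The violation-detection idea the abstract describes, a multiple-version memory in which accesses carry virtual timestamps, can be illustrated with a minimal sketch. All names here (VersionedMemory and its methods) are illustrative, not taken from the thesis:

```python
# Illustrative sketch: a multiple-version memory tagged with virtual
# timestamps. A store arriving "late" in real time but "early" in the
# virtual (causal) order exposes any load that already read a stale value.

class VersionedMemory:
    def __init__(self):
        self.versions = {}   # addr -> list of (timestamp, value) pairs
        self.reads = {}      # addr -> timestamps of loads seen so far

    def load(self, addr, ts):
        # Return the value of the latest store that precedes this load
        # in the virtual order (default 0 if no such store exists).
        self.reads.setdefault(addr, []).append(ts)
        older = [(t, v) for t, v in self.versions.get(addr, []) if t <= ts]
        return max(older)[1] if older else 0

    def store(self, addr, ts, value):
        # Any load with a later virtual timestamp that already read this
        # address consumed a stale value: a dependence violation, so the
        # offending loads must be squashed and re-executed.
        violated = [t for t in self.reads.get(addr, []) if t > ts]
        self.versions.setdefault(addr, []).append((ts, value))
        self.versions[addr].sort()
        return violated

mem = VersionedMemory()
mem.store(0x10, ts=1, value=42)
print(mem.load(0x10, ts=5))             # -> 42
print(mem.store(0x10, ts=3, value=7))   # -> [5]: the ts=5 load is violated
```

    In this sketch the virtual timestamps are unbounded integers; the allocation problem the thesis studies arises precisely because realizable timestamps are restricted in length.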

    Applying Time Warp to CPU Design

    This paper exemplifies the similarities between Time Warp and computer architecture concepts and terminology, and the continued trend towards convergence of ideas in these two areas. Time Warp can provide a means to describe the complex mechanisms being used to allow the instruction execution window to be enlarged. Furthermore, it can extend the current mechanisms, which do not scale, in a scalable manner. The issues involved in implementing Time Warp in a CPU design are also examined, and illustrated with reference to the Wisconsin Multiscalar machine and the Waikato WarpEngine. Finally, the potential performance gains of such a system are briefly discussed. 1. Introduction Computer designers currently face a very interesting set of challenges. The steady increase in the number of transistors on a chip and the speed at which a chip can be clocked has continued its inexorable progress. In 1997, chips with millions of transistors and clock speeds of hundreds of MHz are in routine production an..
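    The core Time Warp mechanism the paper maps onto CPU design, optimistic execution with state saving and rollback when a "straggler" event arrives out of causal order, can be sketched as follows (the class and names are illustrative, not from the paper):

```python
# Hedged sketch of Time Warp: execute events optimistically, checkpoint
# state, and roll back when an event older than local virtual time arrives.

class TimeWarpProcess:
    def __init__(self):
        self.lvt = 0            # local virtual time
        self.state = 0
        self.saved = [(0, 0)]   # checkpoints: (lvt, state)

    def execute(self, ts, delta):
        if ts < self.lvt:       # straggler: causality was violated
            self.rollback(ts)
        self.saved.append((self.lvt, self.state))  # save state, then advance
        self.lvt = ts
        self.state += delta

    def rollback(self, ts):
        # Discard checkpoints at or after the straggler's timestamp and
        # restore the most recent one that precedes it.
        while len(self.saved) > 1 and self.saved[-1][0] >= ts:
            self.saved.pop()
        self.lvt, self.state = self.saved[-1]

p = TimeWarpProcess()
p.execute(10, 1)
p.execute(20, 1)
p.execute(15, 1)        # straggler: rolls back past lvt=20, resumes at 15
print(p.lvt, p.state)   # -> 15 2
```

    In the CPU analogy, events are instructions, local virtual time is program order, and rollback corresponds to squashing mis-speculated work.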

    Constraints on Parallelism Beyond 10 Instructions Per Cycle

    The problem of extracting Instruction-Level Parallelism at levels of 10 instructions per clock and higher is considered. Two different architectures which use speculation on memory accesses to achieve this level of performance are reviewed. It is pointed out that while this form of speculation gives high potential parallelism, it is necessary to retain execution state so that incorrect speculation can be detected and subsequently squashed. Simulation results show that the space to store such state is a critical resource in obtaining good speedup. To make good use of the space it is essential that state be stored efficiently and that it be retired as soon as possible. A number of techniques for extracting the best usage from the available state storage are introduced. Keywords: instruction level parallelism, speculation 1 Introduction Increasingly, computer architects and system designers are seeking to extract more computer performance by making use of parallelism. There are a number of ..
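    The abstract's central claim, that speculative state storage is the critical resource bounding speedup, can be illustrated with a toy issue/retire simulation. The model and its parameters are assumptions for illustration, not the paper's simulator:

```python
# Illustrative model: each in-flight speculative instruction holds one
# state slot from issue until in-order retirement, so a full state
# buffer stalls issue regardless of available issue width.

from collections import deque

def run(num_insns, issue_width, state_slots, latency):
    """Return the cycle count to execute num_insns instructions."""
    in_flight = deque()     # completion cycles, kept in program order
    cycle = issued = 0
    while issued < num_insns or in_flight:
        # Retire completed instructions in order, freeing state slots.
        while in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
        # Issue up to issue_width instructions while slots remain.
        for _ in range(issue_width):
            if issued < num_insns and len(in_flight) < state_slots:
                in_flight.append(cycle + latency)
                issued += 1
        cycle += 1
    return cycle

# Ample state storage sustains throughput near the issue width;
# scarce storage collapses it to roughly state_slots / latency.
print(1000 / run(1000, 16, 256, 8))   # approaches 16 IPC
print(1000 / run(1000, 16, 16, 8))    # limited to about 2 IPC
```

    The model also shows why early retirement matters: the sooner state is retired, the sooner its slot can back a new speculative instruction.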

    Space Constraints on High Levels of ILP

    ILP is one way of effectively using the large number of transistors available on modern CPUs. Two different architectures which use speculation on memory accesses to do this are reviewed. While this form of speculation gives high potential parallelism, it is necessary to retain execution state so that incorrect speculation can be detected and subsequently squashed. It is shown by theoretical arguments and simulation that the space to store such state is a critical resource in obtaining good speedup. The state must be stored efficiently and retired as soon as possible. It is also shown that larger problem sizes may achieve lower extracted parallelism, despite having a higher potential parallelism. 1 Introduction Increasingly, computer architects and system designers seek to extract more computer performance by making use of parallelism. There are a number of ways of approaching this. This paper considers the problem of extracting instruction level parallelism (ILP), that is, paral..